Identifying Temporal Expression and its Syntactic Role Using FST and Lexical Data from Corpus
Abstract
Accurate analysis of temporal expressions is crucial for Korean text processing applications such as information extraction and chunking for efficient syntactic analysis. The problem is complicated because temporal expressions often have ambiguous syntactic roles. This paper discusses two problems: (1) representing and identifying temporal expressions, and (2) determining the syntactic function of a temporal expression when it has a dual syntactic role. In this paper, temporal expressions and the context used for disambiguation, called the local context, are represented using lexical data extracted from a corpus and a finite state transducer (FST). Experiments show that the method is effective for temporal expression analysis. In particular, our approach shows that corpus-based work can yield promising results for this problem in a restricted domain, in that a large amount of lexical data can be handled effectively.
1 Introduction
Accurate analysis of temporal expressions is crucial for text processing applications such as information extraction and chunking for efficient syntactic analysis. In information extraction, a user might want a piece of information about an event. Typically, the event is related to a date or time, which is represented by a temporal expression. Chunking aids efficient syntactic analysis by removing irrelevant intermediate constituents generated during parsing. It involves dividing sentences into non-overlapping segments. As a result of chunking, parsing becomes a problem of analysis inside chunks and between chunks (Yoon et al., 1999). Chunking prevents the parser from producing intermediate structures irrelevant to the final output, which makes the parser efficient without losing accuracy. Thus, chunking is an essential stage for application systems such as MT that must pursue both efficiency and precision.
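As a rough illustration of identifying temporal expressions with a finite-state device over lexical categories, the sketch below maps a token sequence to BIO tags (B-TIME, I-TIME, O) using a small deterministic automaton driven by longest match. It is a minimal sketch under stated assumptions, not the paper's transducer: the lexicon, category names, states, and romanized Korean tokens are all hypothetical, whereas the paper's lexical data are extracted from a corpus.

```python
from typing import Dict, List, Tuple

# Hypothetical lexicon (romanized Korean). In the paper, lexical data of this
# kind is extracted from a corpus; these entries are only for illustration.
LEXICON: Dict[str, str] = {
    "jinan": "MOD",       # 'last'
    "daeum": "MOD",       # 'next'
    "yeoreum": "SEASON",  # 'summer'
    "ju": "WEEK",         # 'week'
    "nyeon": "YEAR",      # counter for years
    "wol": "MONTH",       # counter for months
    "il": "DAY",          # counter for days
}

# Hypothetical transition table over lexical categories:
# (state, category) -> next state. States in FINAL accept a complete expression.
TRANSITIONS: Dict[Tuple[int, str], int] = {
    (0, "MOD"): 1, (1, "SEASON"): 3, (1, "WEEK"): 3,   # e.g. jinan yeoreum
    (0, "NUM"): 2, (2, "YEAR"): 4, (2, "MONTH"): 5, (2, "DAY"): 3,
    (4, "NUM"): 2, (5, "NUM"): 2,                      # e.g. 2000 nyeon 5 wol
}
FINAL = {3, 4, 5}


def category(token: str) -> str:
    """Map a token to its lexical category; digit strings are numbers."""
    if token.isdigit():
        return "NUM"
    return LEXICON.get(token, "OTHER")


def tag_temporal(tokens: List[str]) -> List[str]:
    """Rewrite tokens as BIO tags, marking the longest accepted temporal span."""
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        state, j, end = 0, i, None
        while j < len(tokens):
            nxt = TRANSITIONS.get((state, category(tokens[j])))
            if nxt is None:
                break
            state, j = nxt, j + 1
            if state in FINAL:
                end = j  # remember the longest accepted prefix so far
        if end is None:
            i += 1  # no temporal expression starts at position i
            continue
        tags[i] = "B-TIME"
        for k in range(i + 1, end):
            tags[k] = "I-TIME"
        i = end
    return tags


print(tag_temporal(["geu", "neun", "2000", "nyeon", "5", "wol", "e", "watda"]))
# ['O', 'O', 'B-TIME', 'I-TIME', 'I-TIME', 'I-TIME', 'O', 'O']
```

The example sentence is a hypothetical romanized tokenization of "he came in May 2000" (geu "he", neun topic marker, e locative particle, watda "came"). Keeping the transitions over categories rather than surface words lets a corpus-derived lexicon grow without changing the automaton.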
Korean, an agglutinative language, has well-developed functional words such as postpositions and endings, which decisively determine the grammatical function of a phrase. Moreover, because Korean is a head-final language, in which the head always follows its complement, chunking is relatively easy. However, chunking still faces an ambiguity problem, which is often due to temporal expressions. This is because many temporal nouns can be used as modifiers of either a noun or a verb in a sentence. Let us consider the following examples:
[Example] 1a. jinan (last) yeoreum (summer)
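The dual role described in this paragraph, where a temporal noun may modify either the following noun or the verb, can be illustrated with a toy local-context lookup. This is a swapped-in technique, not the paper's method: it decides the role of a temporal noun from counts keyed on the noun and its right neighbour, backing off to counts for the noun alone and then to a default. The counts, the role labels NMOD/VMOD, the function name, and the romanized tokens (eoje "yesterday", sinmun "newspaper") are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Made-up counts standing in for lexical data collected from an annotated
# corpus: (temporal noun, right neighbour) -> observed syntactic roles.
# NMOD = modifies the following noun; VMOD = modifies the verb (adverbial use).
PAIR_COUNTS: Dict[Tuple[str, str], Counter] = {
    ("eoje", "sinmun"): Counter({"NMOD": 8, "VMOD": 2}),  # 'yesterday' + 'newspaper'
    ("eoje", "watda"): Counter({"VMOD": 5}),              # 'yesterday' + 'came'
}
# Back-off counts for the temporal noun alone, ignoring its context.
WORD_COUNTS: Dict[str, Counter] = {
    "eoje": Counter({"VMOD": 30, "NMOD": 12}),
}


def syntactic_role(tokens: List[str], i: int, default: str = "VMOD") -> str:
    """Guess the role of the temporal noun tokens[i] from its local context."""
    word = tokens[i]
    nxt: Optional[str] = tokens[i + 1] if i + 1 < len(tokens) else None
    counts = PAIR_COUNTS.get((word, nxt)) if nxt is not None else None
    if not counts:  # unseen pair: back off to the word alone
        counts = WORD_COUNTS.get(word)
    if not counts:  # unseen word: fall back to the default role
        return default
    return counts.most_common(1)[0][0]


print(syntactic_role(["eoje", "sinmun", "eul", "ilgeotda"], 0))  # NMOD
print(syntactic_role(["eoje", "hakgyo", "e", "gatda"], 0))       # VMOD (back-off)
```

A single right-neighbour window is the simplest possible local context; a phrase such as eoje sinmun-eul ilgeotda ("read the newspaper yesterday" or "read yesterday's newspaper") shows why a real system would need richer context than this sketch uses.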
Publication date: 2000